Statement of Focus



The Wisconsin Research and Development Center for Cognitive Learning 
focuses on contributing to a better understanding of cognitive learning by chil- 
dren and youth and to the improvement of related educational practices. The 
strategy for research and development is comprehensive. It includes basic re- 
search to generate new knowledge about the conditions and processes of learn- 
ing and about the processes of instruction, and the subsequent development of 
research-based instructional materials, many of which are designed for use by
teachers and others for use by students. These materials are tested and refined 
in school settings. Throughout these operations behavioral scientists, curricu- 
lum experts, academic scholars, and school people interact, insuring that the 
results of Center activities are based soundly on knowledge of subject matter 
and cognitive learning and that they are applied to the improvement of educa- 
tional practice. 

This Technical Report is from the Quality Verification Program and from the
Project on the Structure of Concept Attainment Abilities in Program 1. The Quality
Verification Program assisted in developing tests to measure concept achievement
and identifying reference tests for cognitive abilities, while the Concept
Attainment staff took primary initiative in identifying basic concepts in math- 
ematics at intermediate grade level. The tests will be used to study the rela- 
tionships among cognitive abilities and learned concepts in various subject 
matter areas. The outcome of the Project will be a formulation of a model of 
structure of abilities in concept attainment in a number of subjects, including 
social studies, science, and language arts, as well as mathematics. 
Abstract



Test development efforts for constructing 12 items to measure 
achievement of each of 30 selected mathematics concepts are de- 
scribed. Item and total score statistics for data collected on 196 
girls who had just completed the fifth grade during early summer of
1970 and 195 boys who had just begun the sixth grade during the 
fall of 1970 are presented and discussed. 
Introduction



The primary objective of the project entitled
"A Structure of Concept Attainment Abilities"
(hereafter referred to as the CAA Project)
is to formulate one or more models or structures
of concept attainment abilities, and to
assess their consistency with actual data.
The major steps for attaining this primary 
objective were taken to be: 

1. To identify basic concepts in language
arts, mathematics, science,
and social studies appropriate at the 
fourth grade level, 

2. To develop tests to measure achieve- 
ment of these concepts, 

3. To identify reference tests for cognitive
abilities, and

4. To study the relationships among 
learned concepts in these four sub- 
ject matter fields and the identified 
cognitive abilities. 

This paper describes the test development 
efforts for measuring achievement of selected 
concepts in mathematics; thus, it is a report 
of one aspect of Step 2. As such, it will 
include descriptive item and test statistics 
for the tests developed. The items can be 
found in "Items to Test Level of Attainment 
of Mathematics Concepts by Intermediate- 
Grade Children" (Romberg & Steitz, in press). 

Concepts may be defined in one or more 
of four ways: (a) structurally, in terms of 
perceptible or readily specifiable properties
or attributes; (b) semantically, in terms of 
synonyms or antonyms; (c) operationally, in 
terms of the procedures employed to distin- 
guish the concept from other concepts; or 
(d) axiomatically, in terms of logical or nu- 
merical relationships (Klausmeier, Harris, 
Davis, Schwenn, & Frayer, 1968). "A concept
exists whenever two or more distinguishable
objects or events have been grouped or
classified together and set apart from objects 
on the basis of some common feature or prop- 
erty of each" (Bourne, 1966, p. 1). The con- 
cept of Bourne's definition might be called a 
classificatory one and seems to be the same 
as the structural type discussed by Klausmeier
et al. (1968). This is the type of concept
with which this project is concerned,
and such a definition of a concept served as 
the basis for selection and analysis of sub- 
ject matter concepts. 

Many different types of performance 
might be taken as the critical evidence that 
a student does or does not understand a given 
concept. Thus, as a part of this project it is 
necessary to have a schema for measuring 
understanding of concepts. Such a schema
was developed by Frayer, Fredrick, and 
Klausmeier (1969) and was used by the CAA 
Project to assess concept attainment. The 
"Schema for Testing the Level of Concept 
Mastery" consists of 13 types of questions, 
each involving a different task required of the 
examinee. The schema also allows for selection
of an answer (multiple-choice type questions)
or for production of an answer (completion
type questions). It was decided to use
the first 12 tasks and a multiple-choice format 
for this project. The 12 tasks of the schema 
which were used are: 

1. Given the name of an attribute, 
select an example of the attribute. 

2. Given an example of an attribute, 
select the name of the attribute.

3. Given the name of a concept, select 
an example of the concept.

4. Given the name of a concept, select 
a nonexample of the concept. 

5. Given an example of a concept, select
the name of the concept.

6. Given the name of a concept, select 
the relevant attribute. 

7. Given the name of a concept, select 
the irrelevant attribute. 

8. Given the definition of a concept, 
select the name of the concept. 

9. Given the name of a concept, select 
the definition of the concept. 

10. Given the name of a concept, select 
the supraordinate concept. 

11. Given the name of a concept, select 
the subordinate concept. 

12. Given the names of two concepts,
select the relationship between them. 

Single- or compound-word classificatory 
concepts (those that are defined by attributes) 
in mathematics subject matter at the fourth 
grade level were identified. This task was 
subdivided into four steps: 

1. Identification of the major areas
within the subject matter of mathematics,

2. Selection of three of these major 
areas to be studied, 

3. Identification of classificatory con- 
cepts within each of these three 
major areas, and 

4. Random sampling of ten concepts
from those identified for each of the 
three major selected areas. 

This yielded a total of 30 mathematics concepts
to be studied by the project. A list is
given in Table 1, by area, of the concepts 
identified. The areas are Sets, Division, and 
Expressing Relationships. In a pilot study, 
it was found that a very small percentage of 
mid-year fourth grade students could pronounce 
or render any meaning to nine of the concepts 
in the area of Division. They are algorithm,
associative property, closure property, commutative
property, density property, distributive
property, identity property, order property,
and reciprocal property. These concepts were 
excluded from the random sampling procedure. 
A description of the procedures used to iden- 
tify these concepts can be found in "Selection 
and Analysis of Mathematics Concepts for 
Inclusion in Tests of Concept Attainment" 
(Romberg, Steitz & Frayer, in press). The 
researchers of Project 101, Situational Vari- 
ables and Efficiency of Concept Learning, 
developed a system for analyzing a concept 
in preparation for developing items to mea- 
sure the level of attainment of that concept 
(Frayer, Fredrick, & Klausmeier, 1969). 
Since the publication of that paper they, in 
cooperation with the researchers of the CAA 
Project, have refined their thinking and ad- 
vanced this system. The refinements are 
discussed in "A Structure of Concept Attain- 
ment Abilities: The Problem and Strategies 
for Attacking It" (Harris, Harris, Frayer, & 
Quilling, in press). Briefly, a concept may
be described in many ways: in terms of its
criterial, relevant, and irrelevant attributes;
its examples and nonexamples; its supraordinate,
coordinate, and subordinate hierarchical
relationships (theoretically determined);
and its lawful or other types of relationships
to other concepts. Knowledge of each of these
kinds of information may be tested to determine
a student's level of attainment of a
concept. An analysis, along these lines,
of each of the 30 sampled mathematics concepts
which are being studied can be found
in "Selection and Analysis of Mathematics
Concepts for Inclusion in Tests of Concept
Attainment" (Romberg, Steitz & Frayer, in
press).

Thus, using the analysis of a concept as 
the basis for appropriate content and the 12 
tasks of the schema as the basis for appro- 
priate tasks, 12 items, one for each of the 
12 tasks, were developed wherever possible 
for each of the 30 concepts. For seven of 
the concepts, no item was developed for 
Task 11, so there was actually a total of 
353 rather than 360 mathematics items for
the purpose of measuring and assessing con- 
cept attainment in mathematics. The develop- 
ment of the items, along with item and total 
score statistics (for concepts and for tasks) 
obtained for them for fifth grade boys and 
girls, will be discussed in the following 
sections. 
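
The item count above follows from the crossed design: 30 concepts times 12 tasks, less the seven missing Task 11 items. A minimal sketch of that bookkeeping is given below. It assumes the concepts are simply numbered 1 to 30, and the set `concepts_without_task_11` is a hypothetical placeholder, since this section does not say which seven concepts lack a Task 11 item.

```python
from itertools import product

N_CONCEPTS = 30
N_TASKS = 12

# Hypothetical placeholder: the report does not list here which seven
# concepts lack a Task 11 item, so these numbers are illustrative only.
concepts_without_task_11 = {1, 2, 3, 4, 5, 6, 7}


def build_item_grid(n_concepts, n_tasks, missing):
    """Return the (concept, task) pairs for which an item exists."""
    return [
        (concept, task)
        for concept, task in product(range(1, n_concepts + 1),
                                     range(1, n_tasks + 1))
        if (concept, task) not in missing
    ]


missing_cells = {(concept, 11) for concept in concepts_without_task_11}
items = build_item_grid(N_CONCEPTS, N_TASKS, missing_cells)

print(len(items))  # 30 * 12 - 7 = 353
```

Whichever seven concepts are affected, the total is the same; only the positions of the empty cells in the concept-by-task grid change.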
Table 1

Mathematics Concepts Categorized by Area

Sets                          Division                  Expressing Relationships

Cardinal Number               Algorithm                 Area
*Disjoint Sets                Associative Property      *Average
Element                       Closure Property          [illegible]
*Empty Sets                   Common Denominator        Estimation
*Equal Sets                   Commutative Property      Generating Sentences
*Equivalent Sets              *Denominator              *Graph
Intersection                  Density Property          Length
*Line                         Distributive Property     Liquid
Line Segment                  *Division                 Mathematical Sentences
Non-Disjoint Sets             *Factor                   *Measurement
Ordered Pairs                 *Fraction                 [illegible]
*Parallel Lines               Identity Property         Partial Sums
*Plane                        [illegible]               Place Holder
*Point                        *Multiplication           *Place Value
Set                           *Numerator                Range
Sets of Numbers               Order Property            Round Numbers
Sets of Points                Partial Product           *Solution Set
Skew                          Partial Quotient          *Standard Unit
*Subset                       Partitioning              *Statement
*Subtraction - A way of       *Product                  *Weight
  looking at addition         *Quotient
Triangular Numbers            Reciprocal Property
Union of Sets                 *Remainder
Universal Set
Whole Number

* Concepts randomly selected to be tested.
Procedures



This section contains a discussion of 
the item development procedures used in- 
cluding initial item construction and revision 
of those items based on item analysis results. 
Also included is a discussion of the data 
collection procedures, subjects, and treatment
of the data.



Test Development 

One item for each of the 12 tasks was
generated for each of the 30 selected con- 
cepts. If one looks at the tasks being used 
to measure understanding of the concept, it 
is apparent that there can be more than one 
item generated for at least some of the tasks. 
For example, a Task 1 type item could be
constructed to measure understanding of each 
of many relevant attributes for most concepts. 
For this project, it was decided to construct 
just one multiple-choice item for each task
for each concept. This made it necessary 
to have bases for making choices when such 
choices were necessary. These bases con- 
sisted of principles for selecting attributes, 
relationships, incorrect choices, etc. A 
discussion of such bases may be found in 
"A Structure of Concept Attainment Abilities: 
The Problem and Strategies for Attacking It" 
(Harris et al., in press). 

General procedures for item construction 
included initial item generation by a subject 
matter specialist item writer; critique of the 
items by a committee composed of the item 
writers from each of the four subject matters 
being studied (the other three are language
arts, science, and social studies), an experienced
elementary school teacher specializing
in reading, and a measurement specialist;
and final critique by the subject matter
principal investigator and a measurement
specialist. Concerns in the item construction
process were readability, validity, and reliability.



Readability 

It was intended that no student should be
unable to answer an item correctly simply because
of inability to read the item. In writing
items, very simple language was used wherever
possible. Several pilot studies concerned with
the readability question were conducted, and
two outside consultants expert in the testing
and measurement fields were asked to look at
a sample of the items from the point of view
of readability for fifth graders. No significant
differences were found among treatment groups;
the percentage of occurrences of subjects who
could not pronounce the word and did not know
its meaning when shown the concept labels,
but did know its meaning when the word was
pronounced, was judged to be negligible; and
the two outside consultants independently
advised that there was no reading problem with
the items and that there should be no concern
about administering them in the standard way
in which the students read the items themselves.
The conclusion drawn from the results
of the pilot studies and the consultants' opinions
was that readability of the items was not
a problem and under standard administration
conditions would be satisfactory. For further
information see Harris et al. (in press).



Validity 

The content validity of each of the items 
was of immediate concern during item construc- 
tion; aspects of construct validity were to be 
probed later using duplicate-test construction, 
simplex analysis, and factor analysis of the 
results obtained using the content-valid items 
constructed.

Content Validity. Each item was constructed
to meet the content and task specifications
set for it. The task required of the
student by each item was specified by the
schema adopted for use in measuring concept
attainment. The concept name was given by
the sampling process; the attributes, examples,
definition, and relationships associated
with the concept name were defined by the
prior analysis of the concept. The content
for each item was specified in this manner.
The content specifications were not as precise
as the task specifications due to the
necessity of choosing a single attribute to
be tested, for example, and selecting the incorrect
alternatives to be used in the multiple-choice
questions. Systematic construction
of alternate choices was used whenever possible;
for example, for an item dealing with
the operation of addition, the operations (or
examples of them) of subtraction, multiplication,
and division were used as incorrect
choices.

To further ensure the content validity of
the items, two persons who were familiar
with the schema for testing concept attainment,
but were not involved in the item development
process, classified five random sets of
72 items (12 items for six concepts in each
set) according to content and task. These two
persons had the analyses of the concepts
available. They were able to correctly classify
all but a few of the items. Any questions
they had about these few items were mutually
resolved among the subject matter principal
investigator, the measurement specialist,
and themselves.



Reliability 

Developing one item for each of the 12 
tasks for each of the 30 selected concepts 
yields a 12 (tasks) by 30 (concepts) matrix 
consisting of the score for each of the 360 
items, one for each cell of the matrix, for 
each individual to whom the items were administered.
Thus, a completely crossed
design exists and two types of total scores 
can be secured from this matrix: a total score 
for each of the 30 concepts (totalled across 
tasks) and a total score for each of the 12 
tasks (totalled across concepts). Figure 1 
is an illustration of such a matrix. 
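
The two kinds of total score described above are simply the row and column sums of each examinee's item matrix. The sketch below illustrates this under stated assumptions: item scores are coded 0/1, the matrix is stored with tasks as rows and concepts as columns, and the scores themselves are simulated, not taken from the study.

```python
import random

N_TASKS, N_CONCEPTS = 12, 30


def concept_totals(matrix):
    """Sum each column: one total per concept, taken across the tasks."""
    return [sum(row[c] for row in matrix) for c in range(len(matrix[0]))]


def task_totals(matrix):
    """Sum each row: one total per task, taken across the concepts."""
    return [sum(row) for row in matrix]


# Simulated 0/1 item scores for one examinee (illustrative data only).
rng = random.Random(0)
scores = [[rng.randint(0, 1) for _ in range(N_CONCEPTS)]
          for _ in range(N_TASKS)]

by_concept = concept_totals(scores)  # 30 totals, each between 0 and 12
by_task = task_totals(scores)        # 12 totals, each between 0 and 30

# Both sets of totals add up to the same grand total over all items.
assert sum(by_concept) == sum(by_task)
```

Because every item contributes to exactly one concept total and one task total, the same response data support both sets of scores.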

This design offers these alternatives:
(a) use a total score of 360 items to analyze
all items against; (b) use 30 total scores,
each for one concept and consisting of 12
items, to analyze the 12 task items against;
and (c) use 12 total scores, each for one
task and consisting of 30 items, to analyze
the 30 concept items against. The first alternative
was rejected since it assumes neither
task nor concept variation is present. A
choice was not made between the next two
alternatives. Instead, both were done. An
important theoretical problem of how to item
analyze a completely crossed design like
this remains to be solved.

Major concerns about reliability for the
test development process were that internal
consistency reliability estimates for task
scores (total of 30 items across concepts)
and concept scores (total of 12 items across
tasks) be high enough to warrant further study
using such scores. It was recognized that
there might be some contradictions in what
was attempted. The items were constructed
to comply with the completely crossed design,
30 concepts by 12 tasks. One major objective
of the entire project is to determine the dimensionality
of the selected mathematics concepts
and of the tasks when using mathematics
content. If either or both of these are not
unidimensional, then an internal consistency
reliability estimate based upon items measuring
aspects from the multiple dimensions would
reflect this; the more dimensions present and
the more uncorrelated they are, the lower the
internal consistency estimate. Recognizing
this, and not being able to study the dimensionality
of the two modes (concepts and tasks)
until after the items were developed, pilot
studies were conducted using the items for
some of the concepts for the 12 tasks. As
will be pointed out later, evidence indicates
that sufficiently reliable scores can be obtained
for both task scores and concept scores.

Item Revision 

If one looks at the 12 tasks for a single
concept it becomes quite apparent that there
may be a strong learning effect as one attempts
to answer the items. The name of the concept
appears in every item, except for the
first two which deal with an attribute of the
concept, either in the stem or as a possible
choice. This makes a random presentation
of the items desirable. Using items for six
of the mathematics concepts presented on
mark sense type cards, a study was conducted
in which one group of subjects responded to
the items arranged in the same random order
(over 72 items for the six concepts) common
to all subjects. The second group of subjects
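
[Figure 1: a matrix with Tasks 1 to 12 as rows and Concepts 1 to 30 as columns, the columns grouped as Area 1 (Concepts 1-10), Area 2 (Concepts 11-20), and Area 3 (Concepts 21-30). The row margin gives the Total Score for Tasks and the column margin the Total Score for Concepts.]

Fig. 1. Item matrix for each individual.
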
responded to the items arranged in a random
order (over 72 items for the six concepts)
which was a unique one for each subject of
the group. No significant differences in test 
score were found between the subjects receiv- 
ing a common random order and those receiving 
a unique random order. 
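
The two presentation conditions compared in this study can be sketched as follows. This is only an illustration of the difference between a common and a unique random order; the function names, subject identifiers, and seeds are hypothetical, and the report does not describe how the original orders were generated.

```python
import random

N_ITEMS = 72  # 12 tasks x 6 concepts


def common_order(n_items, seed):
    """One shuffled item order shared by every subject in the group."""
    order = list(range(1, n_items + 1))
    random.Random(seed).shuffle(order)
    return order


def unique_orders(n_items, subject_ids, seed):
    """A separately shuffled item order for each subject."""
    rng = random.Random(seed)
    orders = {}
    for subject in subject_ids:
        order = list(range(1, n_items + 1))
        rng.shuffle(order)
        orders[subject] = order
    return orders


shared = common_order(N_ITEMS, seed=1)
per_subject = unique_orders(N_ITEMS, subject_ids=["s01", "s02", "s03"], seed=2)
```

In both conditions every subject sees all 72 items exactly once; the conditions differ only in whether the sequence is the same across subjects.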

Tryouts of the items for item analysis and 
revision purposes were conducted using a sin- 
gle random order over the items for six concepts 
contained in a test booklet. This constituted 
a "test" of 72 items which could readily be 
administered in 1 hour. The tryouts were con- 
ducted during December, 1969, and January, 
1970, with fifth grade students in the Madison, 
West Allis, and Fond du Lac, Wisconsin school
systems. All of these school systems used
the Greater Cleveland Mathematics Program.
Approximately 100 students responded to each 
"test." Madison students responded to the 
items for six of the concepts, West Allis the 
items for 12, and Fond du Lac students the 
items for 12 of the concepts. 

The tryout data were subjected to the
Generalized Item Analysis Program (GITAP)
(Baker, 1969), the output of which provides
the proportion responding, item-criterion
biserial correlation, X50 (point on the criterion
scale corresponding to the median of the item
characteristic curve), and β (the reciprocal
of the standard deviation of the item characteristic
curve, which is a measure of the discriminating
power of the item) for each possible
choice for each item as well as summary
descriptive statistics for the total test. It
also gives the Hoyt reliability for the total
test and the standard error of measurement.

As discussed earlier, the design for 
these mathematics achievement items is one 
in which the concepts and tasks are complete- 
ly crossed. Since there are no item analysis 
procedures available for completely crossed 
designs, the data were analyzed in each of 
the two possible ways — each item as part of 
the appropriate concept score and as part of 
the appropriate task score. This raises ques- 
tions as to the interpretation of such results. 
The main referents used for interpreting the 
results and as a basis for making item revi- 
sions were the results obtained from the anal- 
yses of the concept scores. The tasks were 
fixed and thus any arbitrary decisions were 
made in regard to appropriate content for 
incorrect choices, etc. Usual standards for 
item indices were not strictly adhered to, as 
a unique design for item analysis was being 
used and a major objective of the project is 
to study the dimensionality of the concepts 
and of the tasks. If high discrimination indices
had been demanded, the dimensionality
might have been affected by making the items
more homogeneous. Also, no attempt was
made to manipulate the difficulty level of the
items, since another objective of the project
is to determine if any differential levels of
difficulty, or complexity, exist in the concepts
and in the tasks. Therefore, the item analysis
results were used as a very general guide to
help in determining whether there were "hidden"
weaknesses, clues, and/or incongruities
in the items and, in an even more general sense,
to show that what we were attempting to do
was possible: sufficiently reliable concept and
task scores could be obtained when using this
completely crossed design.

The revised items can be found in "Items 
to Test Level of Attainment of Mathematics Con- 
cepts by Intermediate-Grade Children" (Rom- 
berg & Steitz, in press). 



Subjects 

The mathematics items were administered 
to 196 girls who had just completed the fifth 
grade during early summer, 1970, and to 195 
boys who were just beginning the sixth grade 
during the fall of 1970 in the public school 
system of Madison, Wisconsin. The students 
were randomly selected from the population of 
all such girls and from the population of all 
such boys. The Madison Public School System 
made available the information concerning the 
populations and used their computing facilities 
to designate the random sample for the girls.

Initially, a random sample of 300 girls 
was drawn. Letters were sent to the parents 
of these students explaining the purpose and 
details of the testing, and inviting their 
daughter to participate in the testing program. 
A stamped and addressed postcard was en- 
closed which the parents were asked to com- 
plete and return indicating whether or not they 
were willing to allow their daughter to parti- 
cipate. One hundred and two yes responses 
and 25 no responses were obtained from the 
cards returned. Those parents who had not 
returned the card by a specified date were 
phoned. An additional 46 yes and 61 no re- 
sponses were obtained by phone. Since this 
total of yes responses did not give as many 
subjects as were desired, an additional sam- 
ple of 150 girls was drawn at random. From 
this sample, 56 yes and 30 no responses were 
obtained by card. Thus, of the total sample 
of 450 students, 203 yes and 116 no responses 
were received; seven students did not com- 
plete the testing, which resulted in a total 
of 196 girls tested. These students were paid 
$7.50 for participating.

A random sample of 756 boys was drawn 
and letters were sent. By mail, 420 yes and
87 no responses were obtained. Thirty-eight 
of the subjects did not complete the testing, 
resulting in 382 boys tested. Of this total,
195 boys completed the mathematics and social
studies items; the others responded to
language arts and science items. As with the 
girls, the boys who completed the testing pro- 
gram were paid $7.50. 

Since the participation of all students 
comprising the random sample was impossible 
to attain, test and IQ data were obtained from 
the files of the Madison Public School System
for both the school population and those par- 
ticipating students for whom the information 
was available. Table 2 includes the summary 
statistics for the population of fifth grade stu- 
dents in the public school system of the city 
of Madison during the school year of 1969-70, 
and for the boys and the girls who comprised 
the tested samples for the mathematics items. 
The IQ scores were obtained in a fall, 1968, 
administration of the Lorge-Thorndike Intelli- 
gence Test when the subjects were fourth 
graders; and the scores on the Iowa Tests of 
Basic Skills, given in grade equivalent scores, 
were obtained in the fall of 1969 when the subjects
were fifth graders.

Data on fathers' occupations were collected
from the students using the Master Occupational
Code of the United States Bureau of the
Census. These data were tabulated and are 
presented in Table 3. 



Data Collection 

The data for the girls were collected in 
two different schools during five 2-hour daily 
sessions for one week. Subjects could choose 
the week and the school in which they wanted
to report for testing. A one-week session was
held at Hawthorne School from June 22 to 
June 26, and a one-week session was held at 
Hoyt School from July 13 to July 17. Each 
2-hour session consisted of a 72-item "test" 
composed of mathematics items, a 72-item 
"test" composed of social studies items, and 
an activity break between the two of approxi- 
mately 1/2 hour. The mathematics and the 
social studies items were given first on alternate
days.

The data for the boys were collected in 
a similar manner from mid-October to mid- 
November. Ninety of the boys who were 
attending Middle School for sixth grade were 
tested after school for five consecutive days 
in one week at Schenk, Sennett, and Orchard
Ridge schools; those 105 elementary school 
boys who completed the testing (who were 
attending a Junior High School) were tested 
on three consecutive Saturday mornings at 
Franklin, Longfellow, and Randall schools. 
Table 2

Test Data for the Population and Samples of Madison, Wis. Fifth Grades

                                              Population         Boys        Girls

Lorge-Thorndike Intelligence Test   X (mean)      106.60       105.95       112.02
                                    s                           14.74        12.15
                                    N               2605          169          191

Iowa Tests of Basic Skills

  Vocabulary                        X (mean)        5.53         5.60         5.75
                                    s                     [illegible]  [illegible]
                                    N               2520          181          187

  Reading Comprehension             X (mean)        5.44         5.43         5.84
                                    s                            1.60         1.46
                                    N               2520          181          187

  Language Skills                   X (mean) [illegible]         5.07         5.74
                                    s                            1.43         1.29
                                    N               2520          181          187

  Work-Study Skills                 X (mean)        5.46         5.50         5.70
                                    s                            1.31         1.13
                                    N               2520          181          187

  Arithmetic Skills                 X (mean)        5.05         5.08         5.24
                                    s                            1.04          .97
                                    N               2520          179          187

  Composite                         X (mean)        5.35         5.34         5.65
                                    s                            1.22         1.10
                                    N               2520          179          185


The mathematics items were arranged in
five 72-item "tests." The order of the items
was assigned randomly over the 360 items. Two
different random orders were used to collect
the data: one for each school for the girls and
one for each type of school for the boys.

The items were arranged in five test book- 
lets according to the random order. The stu- 
dents responded to the items by marking their 
chosen response directly on an answer sheet. 
The answer sheets were read by machine and 
the responses punched onto data cards. 

Treatment of the Data 



The treatment of the data consisted of 
two main procedures: reliability estimation 
and item analysis. The data were analyzed 
separately for each sex group. Hoyt analysis 
of variance reliability estimates were obtained 
for each of the 30 concept scores and each of 
the 12 task scores for each group studied. 
Means and standard deviations for each of
the scores were also computed.
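
Hoyt's estimate treats the data for one score as a persons-by-items analysis of variance and is computed as 1 minus the ratio of the residual mean square to the between-persons mean square. The sketch below is an illustrative implementation of that textbook formula, not the program used in the study; the function name and the toy data are assumptions, and the function expects at least two persons, at least two items, and some variation among person totals.

```python
def hoyt_reliability(scores):
    """Hoyt (analysis of variance) reliability for a persons x items table.

    scores: list of rows, one per person, each a list of item scores.
    Returns 1 - MS_residual / MS_persons.
    """
    n_persons = len(scores)
    n_items = len(scores[0])
    grand_total = sum(sum(row) for row in scores)
    correction = grand_total ** 2 / (n_persons * n_items)

    # Sums of squares for the two-way layout without replication.
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_persons = sum(sum(row) ** 2 for row in scores) / n_items - correction
    item_sums = [sum(row[j] for row in scores) for j in range(n_items)]
    ss_items = sum(t * t for t in item_sums) / n_persons - correction
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n_persons - 1)
    ms_residual = ss_residual / ((n_persons - 1) * (n_items - 1))
    return 1.0 - ms_residual / ms_persons


# Toy data: five persons by four items, scored 0/1 (illustrative only).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(toy), 3))  # 0.8
```

For a concept score the table would have 12 item columns, and for a task score 30. The value obtained this way is algebraically the same as coefficient alpha for the same data.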

Item analyses using the GITAP program 
(Baker, 1969) were obtained for each of the 
items as a part of two different scores: an 
appropriate concept score and an appropriate 
task score. This program provides proportion 
responding, item-criterion biserial correlation,
X50, and β statistics for each choice
of each item. The proportion of students who 
respond correctly to an item is an index of 
the difficulty level of that item. The greater 
the value of the difficulty index, the easier 
the item. The biserial correlation coefficient 
is an index of the discriminating ability of 
the item choice. For these analyses the cri- 
terion ability used was total concept or total 
task score. X50 is the point on the criterion 
scale, given in standard deviation units,
corresponding to the median of the item char- 
acteristic curve. It is the point at which sub- 
jects with that score have a 50-50 chance of 
choosing that response. β is the reciprocal
of the standard deviation of the item charac- 
teristic curve at the X50 point. It is an index 
of the discrimination power of the item. 
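
The four indices described above can be illustrated with a short sketch. It assumes the usual normal-ogive relations between the biserial correlation and the item characteristic curve (X50 equal to the normal deviate for the response proportion divided by the biserial, and β equal to the biserial divided by the square root of one minus its square). These relations are an assumption about, not a transcription of, what GITAP computes, and the function name and sample data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def item_indices(item_choice, criterion):
    """Proportion, biserial r, X50, and beta for one item choice.

    item_choice: 0/1 per examinee (1 = chose this response).
    criterion:   criterion score per examinee (concept or task total).
    Assumes 0 < p < 1 and 0 < |r_bis| < 1; outside that range the
    normal-ogive quantities below are undefined.
    """
    n = len(item_choice)
    p = sum(item_choice) / n            # proportion choosing the response
    q = 1.0 - p
    std_normal = NormalDist()
    z_cut = std_normal.inv_cdf(q)       # normal deviate with area p above it
    ordinate = std_normal.pdf(z_cut)    # height of the normal curve at z_cut

    chose = [c for c, u in zip(criterion, item_choice) if u == 1]
    other = [c for c, u in zip(criterion, item_choice) if u == 0]
    r_bis = (mean(chose) - mean(other)) / pstdev(criterion) * (p * q / ordinate)

    x50 = z_cut / r_bis                 # in criterion standard deviation units
    beta = r_bis / (1.0 - r_bis ** 2) ** 0.5
    return {"p": p, "r_bis": r_bis, "x50": x50, "beta": beta}


# Ten hypothetical examinees: item response and criterion (total) score.
responses = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [12, 11, 10, 9, 8, 7, 6, 5, 4, 3]
print(item_indices(responses, totals))
```

With half of the examinees choosing the response, X50 falls at the criterion mean (zero in standard deviation units); an easier item, chosen by more than half, would have a negative X50.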
Table 3

Distribution of Fathers' Occupations

Occupation                                                    Boys   Girls

PROFESSIONAL, TECHNICAL, AND KINDRED WORKERS
00. Accountant
01. Architect
02. Dentist
03. Engineer
04. Lawyer, Judge
05. Clergyman
06. Doctor
07. Nurse
08. Teacher, Professor
09. Other Professional

FARMER
11. Farmer

MANAGERS, OFFICIALS, PROPRIETORS, EXCEPT FARM
21. Owner of Business
22. Manager, Official

CLERICAL AND KINDRED WORKERS
31. Bookkeeper
32. Receptionist
39. Other Clerical and Kindred Workers

SALES WORKERS
49. Salesman

CRAFTSMEN, FOREMEN, AND KINDRED WORKERS (SKILLED WORKERS)
51. Craftsman, Skilled Worker
52. Foreman
53. Armed Services - Officer
54. Armed Services - Enlisted Man

OPERATIVES AND KINDRED WORKERS (SEMI-SKILLED WORKERS)
61. Truck Driver
62. Operative in Factory
69. Other Operative and Kindred Workers

PRIVATE HOUSEHOLD AND SERVICE WORKERS
71. Fireman
72. Policeman
73. Other Protective Service Worker
74. Practical Nurse, Nurse's Aide
75. Private Household Workers
79. Other Service Workers

81. Non-Farm Laborer
82. Farm Laborer

91. Not presently in labor force
99. Not ascertained



2 
1 

5 
4 



18 
16 



2 
12 



20 

31 
2 
1 
1 

10 
9 
18 

1 
1 

2 
1 
14 



4 
13 



2 
1 

8 

3 



21 
22 



11 

5 

15 

17 
4 
1 

5 
8 

23 

3 
1 

13 



8 

22 
Results and Discussion



The means, standard deviations, and
Hoyt reliability estimates obtained for the
data collected during the summer and fall of 1970
using the revised items are presented,
separately for boys and girls, for total concept
and total task scores. Also included in this
section are a presentation and discussion of
the item indices obtained for the correct choice
of each item using both concept and task
criterion scores.

Reliability Estimates and 
Test Statistics 

Table 4 contains the means, standard
deviations, and Hoyt reliability estimates
obtained for the data collected during summer
and fall, 1970, using the revised items for
total concept and total task scores. The data
were analyzed separately for the 195 boys
and the 196 girls. The key for the task scores
appears on the table; the key for the concept
scores is given by the numbers in parentheses
in the list of concepts presented in Table 1.
For example, concept number 1 is Disjoint
Sets, number 2 is Empty Sets, number 3 is
Equal Sets, etc. In general, the concept
scores consist of 12 items each, and the
task scores of 30 items each. Exceptions
to this are noted in two of the footnotes.
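
The report does not spell out how the Hoyt estimates were computed. As a minimal sketch, assuming the usual analysis-of-variance form of Hoyt's coefficient (one minus the ratio of the residual mean square to the between-subjects mean square), the estimate for one concept or task score could be obtained as follows. The function name and the tiny data matrix are illustrative only and are not taken from the study.

```python
def hoyt_reliability(scores):
    """Hoyt (analysis-of-variance) reliability for a subjects-by-items matrix.

    `scores` is a list of rows, one per subject; each row holds that
    subject's 0/1 scores on the items making up one total score.
    Assumes at least two subjects, at least two items, and some
    variation among the subjects' totals.
    """
    n_subjects = len(scores)
    n_items = len(scores[0])
    n_cells = n_subjects * n_items

    grand_total = sum(sum(row) for row in scores)
    correction = grand_total ** 2 / n_cells

    # Sums of squares for the two-way (subjects x items) layout.
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_subjects = sum(sum(row) ** 2 for row in scores) / n_items - correction
    item_totals = [sum(row[j] for row in scores) for j in range(n_items)]
    ss_items = sum(t * t for t in item_totals) / n_subjects - correction
    ss_residual = ss_total - ss_subjects - ss_items

    ms_subjects = ss_subjects / (n_subjects - 1)
    ms_residual = ss_residual / ((n_subjects - 1) * (n_items - 1))
    return 1.0 - ms_residual / ms_subjects


if __name__ == "__main__":
    # Hypothetical responses: 4 subjects by 3 items.
    example = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
    print(round(hoyt_reliability(example), 2))
```

For dichotomously scored items this form is algebraically the same as Kuder-Richardson formula 20, so either computation should reproduce the same value on the same data.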

The mean scores for boys are generally
lower than the mean scores for girls. No
conclusions can be drawn from this, however,
as the data for the girls were collected in
early summer shortly after the school year of
their fifth grade had ended and the data for
the boys were collected in the fall shortly
after the school year of their sixth grade had
begun. Thus, it cannot be determined what,
if any, of this difference is due to a sex
difference and what is due to a time difference
and possible forgetting factor. It should also
be noted that the scores for Concepts 8, 15,
and 22 are based on one more item for boys than
they are for girls: Concept 15 has 11 and 10
items for boys and girls, respectively, and
Concepts 8 and 22 each have 12 and 11 items,
respectively, making up the total score. The scores
for Tasks 1, 2, and 9 are made up of 30 items
for boys but only 29 for girls.

The standard deviations and Hoyt reliability
estimates are generally higher for boys than
they are for girls.

The reliability estimates are sufficiently 
high to warrant study of the dimensionality of 
these selected mathematics concepts and the 
tasks when using mathematics content. This 
is a major objective of the CAA Project and is 
the main purpose for developing these items 
to measure mathematics concept attainment. 

As was mentioned earlier, the subject matter
specialists categorized the identified
mathematics concepts into three major areas: Sets,
Division, and Expressing Relationships. This
was done on a theoretical basis. The data
could be, and were, analyzed by area for task
scores. Instead of a single total task score
consisting of the score for that task type item
for each of the 30 concepts, three different
task scores were obtained for each of the 12
tasks, each consisting of the score for that task
type item for each of the 10 concepts within
a single area. The mean, standard deviation,
and Hoyt reliability estimate for each of these
36 scores, 3 areas by 12 tasks, were obtained.
Table 5 contains the reliability estimates
obtained for task scores by area and for the total
across all 30 of the concepts. Spearman-Brown
estimates for tripled test lengths (some are
given at the bottom of Table 5 for comparison
purposes) indicate that the area distinctions
are not important ones; the reliability estimates
for the total task scores are about what would
be expected from tripling the length of the test
when the single area reliability estimates are
of the magnitude that were obtained. Also, pre- 
Table 4

Means, Standard Deviations, and Reliabilities for
Mathematics Concept and Task Scores: Boys and Girls

       Concepts (a, b)                              Tasks (c)
       Mean            Standard Dev.   Hoyt Rel.    Mean               Standard Dev.   Hoyt Rel.
No.    Boys    Girls   Boys    Girls   Boys  Girls  Boys     Girls     Boys    Girls   Boys  Girls

1      6.51    7.18    2.20    2.01    .48   .42    18.89    19.34*    5.41    4.13    .81   .71
2      7.09    8.06    2.66    2.32    .67   .61    18.14    19.45*    5.72    5.12    .82   .80
3      6.15    7.25    2.61    2.41    .64   .62    20.16    22.25     5.04    4.00    .80   .73
4      6.99    7.42    2.34    2.34    .55   .60    20.42    22.79     5.03    4.00    .79   .73
5      7.51    8.34    2.33    2.01    .61   .49    18.52    21.05     5.56    4.34    .82   .75
6      6.95+   7.43+   1.99    1.80    .49   .41    16.79    19.44     6.21    5.56    .84   .82
7      5.48    6.36    2.54    2.48    .62   .62    12.63    12.51     4.93    4.61    .73   .70
8      6.82    6.59+   2.49    2.21    .62   .56    16.92    20.40     6.30    5.86    .85   .85
9      5.89    6.10    2.62    2.49    .63   .59    16.94    18.54*    6.02    5.37    .83   .81
10     6.63+   7.43+   2.63    2.04    .71   .58    15.28    17.16     5.33    5.11    .78   .78
11     6.68    8.10    3.00    2.53    .74   .66    11.85**  13.65**   4.49    3.73    .77   .68
12     7.18    8.57    2.66    2.24    .67   .61    12.25    13.51     4.13    3.99    .62   .58
13     5.02    5.48    2.58    2.55    .62   .60
14     7.69    8.87    2.61    2.47    .69   .73
15     7.14+   7.28++  2.51    2.14    .69   .68
16     7.33+   7.64+   2.49    2.27    .71   .66
17     6.26+   7.19+   2.39    2.28    .62   .63
18     6.79    7.12    2.94    2.90    .75   .76
19     6.20    6.74    2.69    2.51    .67   .64
20     6.50    7.65    2.55    2.45    .64   .64
21     5.66+   5.87+   2.19    2.09    .53   .52
22     7.49    7.92+   2.29    1.71    .58   .42
23     6.43    7.11    2.31    2.18    .57   .55
24     5.21+   6.24+   2.31    2.29    .58   .60
25     6.65    7.97    2.67    2.39    .65   .62
26     5.65    6.32    2.58    2.16    .65   .50
27     6.>S    7.41    2.44    2.33    .61   .61
28     6.83    7.42    2.58    2.02    .65   .44
29     7.16    7.84    2.38    2.16    .59   .55
30     8.55    9.21    2.52    1.93    .71   .64

Key for Tasks:
   1  Given name of attribute, select example.
   2  Given example of attribute, select name.
   3  Given name of concept, select example.
   4  Given name of concept, select nonexample.
   5  Given example of concept, select name.
   6  Given concept, select relevant attribute.
   7  Given concept, select irrelevant attribute.
   8  Given definition of concept, select name.
   9  Given name of concept, select definition.
  10  Given concept, select supraordinate concept.
  11  Given concept, select subordinate concept.
  12  Given two concepts, select relationship.

a The key for the concepts is given by the numbers in parentheses in the list of concepts (Table 1).
b Scores consist of 12 items each except those marked as follows: + has 11 and ++ has 10.
c Scores consist of 30 items each except those marked as follows: * has 29 and ** has 23.
Table 5

Reliability Estimates for Task Scores by Area and Total for Girls

       Area
Task   Set Theory (a)   Division (a)   Expressing Relationships (a)   Total (b)

1      .36              .45+           .51                            .71*
2      .53+             .61            .57                            .80*
3      .46              .53            .49                            .73
4      .41              .55            .49                            .73
5      .49              .59            .49                            .75
6      .60              .65            .58                            .82
7      .42              .54            .33                            .70
8      .56              .73            .65                            .85
9      .62              .63            .50+                           .81*
10     .56              .66            .40                            .78
11     .29++            .45+++         .48++                          .68**
12     .26              .41            .19                            .58

a Scores consist of 10 items each except those marked as follows: + has 9, ++
has 8, and +++ has 7.

b Scores consist of 30 items each except those marked as follows: * has 29 and
** has 23.

For comparison, these are the Spearman-Brown estimates for tripled test length:

Original   Estimated
.40        .67
.50        .75
.60        .82
.65        .85
.70        .88



liminary factor results indicate that the area 
distinctions are not important ones. The fac- 
tor analyses of these data will be reported in 
a later paper. 
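
The Spearman-Brown comparison values given with Table 5 can be reproduced with the standard prophecy formula, r_k = k r / (1 + (k - 1) r), with k = 3 for a tripled test length. The short sketch below is illustrative only; the function name is hypothetical and is not part of the analysis programs used in the study.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by `length_factor`.

    Standard Spearman-Brown prophecy formula; `reliability` is the
    estimate for the original (shorter) test.
    """
    return (length_factor * reliability
            / (1.0 + (length_factor - 1.0) * reliability))


if __name__ == "__main__":
    # Single-area reliabilities of the magnitudes listed with Table 5,
    # projected to a test three times as long.
    for original in (0.40, 0.50, 0.60, 0.65, 0.70):
        print(f"{original:.2f} -> {spearman_brown(original, 3):.3f}")
```

As a rough check against Table 5, a single-area estimate near .49 (about the level of the three Task 3 area estimates) projects to roughly .74 for a tripled length, which is close to the .73 reported for the total Task 3 score.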



Item Indices 

Table 6 contains the item indices obtained,
separately for boys and girls, based on both
concept and task criterion scores. The indices
included are proportion correct (this frequently
is called difficulty or P), item-criterion
biserial correlation, X50, and β. They are
given for the correct choice only. The key
for the concepts is given by the numbers in
parentheses in the list of concepts given in
Table 1 (it is the same as for Table 4) and the
key for the tasks is given in Table 4. The
item number has no special meaning; it is a
coding number and was included in the table
as an organizational aid. Decimals have
been omitted from the proportion correct and
the biserial correlation columns. Note that
proportion correct is the same whether analyzed
using the concept criterion score or the task
criterion score; hence, there is only one
column each for boys and girls. The other item
indices differ according to criterion score
used. When an item was missing from the
data collected, the appropriate row was left
blank except for the identifying numbers, e.g.,
Item 203 for Concept 17 - Task 11. Three
items, Nos. 71, 86, and 261, were missing
from the data collected for the girls but were
available for the boys; in this case only the
columns for the girls are blank. There are a
few instances where there is a blank in an
X50 column. If β is very low, the X50
becomes essentially meaningless; thus, X50
is not included if the β value is less than
.10.
If desired, the items that make up a criterion
score can be separated out. This is
easy to do for a concept; the items composing 
the criterion score are simply the 12 given in 
order consisting of one of each task type. 
For example, the items composing the criteri- 
on score for Concept 3 are numbered 25 through 
36. The items composing the criterion score 
for a task are those with the same task number 
for each of the concepts; for example, the 
items composing the criterion score for Task 1 
are numbered 1, 13, 25, 37, etc., with the 
last one being number 349. 
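
The numbering rule described above (items listed concept by concept, 12 task types per concept) can be written out as a small sketch. The function names are hypothetical; the arithmetic follows from the examples in the text (Concept 3 is items 25 through 36, Task 1 runs 1, 13, 25, ... 349, and Item 203 is Concept 17 - Task 11).

```python
N_CONCEPTS = 30  # concepts selected for study
N_TASKS = 12     # task types, one item per task for each concept


def item_number(concept, task):
    """Coding number of the item for a given concept (1-30) and task (1-12)."""
    return N_TASKS * (concept - 1) + task


def concept_and_task(item):
    """Inverse lookup: (concept, task) for a coding number from 1 to 360."""
    concept_index, task_index = divmod(item - 1, N_TASKS)
    return concept_index + 1, task_index + 1


def items_for_concept(concept):
    """Items making up the criterion score for one concept."""
    return [item_number(concept, task) for task in range(1, N_TASKS + 1)]


def items_for_task(task):
    """Items making up the criterion score for one task type."""
    return [item_number(concept, task) for concept in range(1, N_CONCEPTS + 1)]


if __name__ == "__main__":
    print(items_for_concept(3))
    print(items_for_task(1))
    print(concept_and_task(203))
```

Applied to the three items missing for the girls, the same rule places Item 71 in Concept 6 - Task 11, Item 86 in Concept 8 - Task 2, and Item 261 in Concept 22 - Task 9.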

As was evident from the means of the
total scores, and as can be seen from the two
difficulty indices given for the items
(proportion correct and X50), the items, in general,
were more difficult for the boys than for the
girls. There is not a one-to-one
correspondence for each item, however; there are some
exceptions, since some items were more diffi- 
cult for the girls and some were about the 
same. As was pointed out earlier, however, 
no conclusions can be drawn from this because 
the data for the girls were collected in early 
summer shortly after the end of their fifth 
grade school year and the data for the boys 
were collected in the fall shortly after their 
sixth grade year had begun. The difficulty indices 
obtained indicate that these items are of appro- 
priate difficulty levels for these subjects. 

It seems clear from looking at Table 6
that X50 gives more precise information about
the difficulty level of an item when that same
item is a part of each of two criterion scores.
The proportion correct remains the same for
both of the criterion scores. This index tells
how many subjects chose the correct
answer for an item, but it says nothing about
their ability level as measured by a particular
criterion score (total concept score or total
task score in this case). The item difficulty
index, X50, gives (in standard deviation units)
the criterion score at which a subject would
have a 50-50 chance of getting the item
correct. For example, an X50 value of 1.20 for
an item indicates that subjects with a
criterion score 1.20 standard deviation units above
the mean have a 50% chance of answering that
item correctly. Subjects with a criterion score
higher than this would have a greater chance
of answering that item correctly, and subjects
with a criterion score lower than this would
have a lesser chance. Likewise, an X50 value
of -1.20 means that subjects with a criterion
score 1.20 standard deviation units below the
mean would have a 50% chance of getting that
item correct; for a higher score the chance
would be greater, and for a lower score the
chance would be less. Knowing both X50 and
β for an item allows one to readily determine
the probability of answering an item correctly
for any point on the criterion scale (Baker,
1964). It may be pointed out that when P = .50,
X50 = .00; when P is greater than .50, X50
will be negative and, for a given P, the
higher the β value the closer to zero will be
the X50 value. This can be seen from
inspecting Table 6. For example, for Item 1 the β is
higher for the concept score than it is for the
task score for both boys and girls; similarly,
for both boys and girls, the X50 value is closer
to zero for the concept score than it is for the
task score. For P less than .50, the X50 will
be positive, and again, for a given P, the
higher the β value the closer to zero will be
the X50 value. See Item 7 for an illustration
of this.
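
The statement that X50 and β together give the probability of a correct response at any point on the criterion scale can be illustrated with a short sketch. It assumes the item characteristic curve is a normal ogive, so that the probability at criterion score x (in standard deviation units) is the standard normal distribution function evaluated at β(x - X50); this form is consistent with the definitions of X50 and β given earlier, but the function name and the numerical values below are illustrative only.

```python
from statistics import NormalDist


def probability_correct(criterion_score, x50, beta):
    """Probability of a correct response under an assumed normal-ogive curve.

    `criterion_score` and `x50` are in standard deviation units of the
    criterion (total concept or total task) score; `beta` is the
    reciprocal of the standard deviation of the item characteristic curve.
    """
    return NormalDist().cdf(beta * (criterion_score - x50))


if __name__ == "__main__":
    # Hypothetical item with X50 = 1.20 and beta = 0.50.
    for score in (-1.0, 0.0, 1.2, 2.0):
        print(score, round(probability_correct(score, 1.20, 0.50), 3))
```

At a criterion score equal to X50 the probability is .50 whatever the value of β; a larger β makes the probability rise more steeply on either side of that point.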

The two item discrimination indices,
biserial correlation and β, are more closely
related since β is computed as a function of
the biserial correlation (Baker, 1969). They
are not linearly related, however. From .00 to
about .30 (absolute) they are very nearly the
same; beyond this, β begins to increase
quite rapidly in magnitude. It may be pointed
out that β is always equal to or greater
(absolute) than the biserial correlation. As a
general rule, .30 is often used as a lower
cutting point for a desirable biserial
correlation or β. For a total score composed of
relatively few items, as is the concept score,
a much higher minimum would be desirable.
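
The report does not give the formulas linking these indices. A sketch under the usual normal-ogive assumptions is β = r / sqrt(1 - r²) and X50 = γ / r, where r is the item-criterion biserial correlation and γ is the normal deviate that cuts off the proportion P in the upper tail. Whether the GITAP program computed the indices in exactly this way is an assumption here; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def beta_from_biserial(r_bis):
    """Discrimination index under the assumed normal-ogive relation.

    Requires -1 < r_bis < 1.
    """
    return r_bis / sqrt(1.0 - r_bis ** 2)


def x50_from_p_and_biserial(p_correct, r_bis):
    """X50 from proportion correct and biserial correlation (assumed relation).

    gamma is the normal deviate with proportion `p_correct` above it,
    so it is negative for easy items (P > .50) and positive for hard ones.
    Requires 0 < p_correct < 1 and r_bis != 0.
    """
    gamma = NormalDist().inv_cdf(1.0 - p_correct)
    return gamma / r_bis


if __name__ == "__main__":
    for r in (0.10, 0.30, 0.50, 0.70):
        print(r, round(beta_from_biserial(r), 3))
    print(round(x50_from_p_and_biserial(0.70, 0.40), 2))
    print(round(x50_from_p_and_biserial(0.70, 0.60), 2))
```

These relations reproduce the behavior described in the text: β is nearly equal to the biserial correlation up to about .30 and grows faster beyond that, X50 is zero when P = .50, and for a given P a higher biserial correlation (and hence a higher β) moves X50 closer to zero.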

As can be seen from Table 6, most of the
mathematics items have desirable biserial
correlations and βs when the item is both a
part of a concept criterion score and a task
criterion score. The most obvious thing is
that the βs are higher, with a few exceptions,
when the item is a part of a concept criterion
score than when it is a part of a task criterion
score. This is to be expected since the
concept score consists of considerably fewer items
than does the task score: 12 items for most
concept scores and 30 items for most task scores.
The item-criterion biserial correlation is a
part-whole correlation, with the criterion the total
score of which the item is a part, and the fewer
the number of items the greater should be the
correlation of that item with the total score of
which it is a part. Since β is computed as
a function of the biserial correlation, it is
affected in the same manner. There does not
seem to be a consistent pattern in the
magnitude of the βs for the boys as compared
with the girls. For some of the items, the
βs are considerably higher for the boys and
for some of them they are considerably higher
for the girls. For the tryouts of the items,
data for both boys and girls were analyzed
together. If the data for boys and girls were
pooled and item analyzed, the β values
would probably increase for most of the items.

As was discussed earlier, these item
indices were obtained by performing
conventional item analyses on two different types of
scores: one for concept criterion scores and
one for task criterion scores. This was
necessitated by the lack of item analysis procedures
appropriate for use with data collected using
a completely crossed design to build the items.
It is not known how the item indices would be
affected if procedures were available to
compute them simultaneously taking into account
the effects of the crossed design. A guess
would be that discrimination indices would be
affected more than would difficulty indices, if
there were an effect. It is plausible to expect
that there may be some concept-task
interactions which cannot be, at least readily,
ascertained by doing a conventional item
analysis on the two types of scores.
IV

Summary and Conclusions 



The primary objective of the project
entitled "A Structure of Concept Attainment
Abilities" is to formulate one or more models or
structures of concept attainment abilities,
and to assess their consistency with actual
data. One of the major steps for attaining 
this primary objective was taken to be the 
development of tests to measure achievement 
of selected language arts, mathematics, sci- 
ence, and social studies concepts appropriate 
at the fourth grade level. This paper describes 
the test development efforts and presents the 
item and total score statistics obtained using 
the revised items developed for measuring 
achievement of selected concepts in mathe- 
matics. 

Subject matter specialists identified
single- or compound-word classificatory concepts
for three major areas, and randomly selected
10 from each area to be studied. These 30
selected concepts were then analyzed. Twelve
items for each concept were developed, one
for each of the first 12 tasks of "A Schema
for Testing Level of Concept Mastery" (Frayer,
Fredrick, & Klausmeier, 1969).

The items that were developed were
administered during early summer of 1970 to 196 girls
who had just completed the fifth grade and
during the fall of 1970 to 195 boys who had just
begun the sixth grade. These data were item
analyzed, separately for boys and for girls,
using the GITAP program (Baker, 1969).

The means, standard deviations, and Hoyt
reliability estimates obtained are presented and
discussed for total concept and total task scores.
Four different item indices (proportion correct,
item-criterion biserial correlation, X50, and
β), obtained for each item based on each of
two criterion scores, appropriate total concept
score and appropriate total task score, are
presented and discussed.



Conclusions 

The major conclusions drawn are: 

1. The reliability estimates obtained for 
both total concept scores and total 
task scores are sufficiently high to 
warrant study of the dimensionality of 
these selected mathematics concepts 
and the dimensionality of the tasks 
when using mathematics content. 

2. The three area distinctions seem not 
to be important ones. 

3. The difficulty item indices obtained 
indicate that these items are of appro- 
priate difficulty levels for these sub- 
jects. 

4. Most of the items have desirable levels 
of discrimination indices when the item 
is both a part of a concept criterion 
score and a task criterion score. 



Recommendation 

The completely crossed design used to
construct these achievement tests is a very
interesting one. This type of design might well
be used more often in the future. It would be
highly desirable to have available item analy- 
sis procedures that are appropriate for analyzing 
such crossed designs. At the present such a 
methodology is not known. 